Computerization of African languages-French dictionaries
Abstract
This paper reports on work carried out during the DiLAF project, which consists in converting five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-Zarma, all still considered under-resourced with respect to Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The paper first presents the DiLAF project and then describes each dictionary. It next presents the methodology for converting .doc files to XML, with a specific discussion of the use of Unicode, and details each step of the conversion into XML and LMF. The last part presents Jibiki, the lexical resource management platform used for the project.
Similar resources
Vers l'informatisation de quelques langues d'Afrique de l'Ouest (Towards the computerization of some west-african languages) [in French]
Chantal Enguehard (1), Soumana Kané (2), Mathieu Mangeot (3), Issouf Modi (4), Mamadou Lamine Sanogo (5). (1) LINA, 2 rue de la Houssinière, BP 92208, 44322 Nantes Cedex 03, France; (2) CNR-ENF, BP 62, Bamako, Mali; (3) LIG, BP 53, 38041 Grenoble, France; (4) MEN/A/PLN/DGPLN/DREL, BP 557, Niamey, Niger; (5) CNRST, BP 7047, Ouagadougou 03, Burkina Faso. Mathieu.Mangeot@i...
Automatic Diacritic Restoration for Resource-Scarce Languages
The orthography of many resource-scarce languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are often represented in digital language resources as their unmarked equivalents. This renders corpus compilation more difficult, as these languages typically do not have the benefit of large electronic dictionaries to perform di...
Conversion of Lexicon-Grammar tables to LMF: application to French
In this chapter, we describe the first experiment of conversion of Lexicon-Grammar tables for French verbs into the LMF format. The Lexicon-Grammar of the French language is currently one of the major sources of lexical and syntactic information for French. Its conversion into an interoperable representation format according to the LMF standard makes it usable in different contexts, thus contri...
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Combining Corpus and Machine-Readable Dictionary Data for Building Bilingual
This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources in...
Journal: CoRR
Volume: abs/1405.5893
Publication date: 2014